Abstract

The human brain has many remarkable information processing char- 
acteristics that deeply puzzle scientists and engineers. Among the most 
important and the most intriguing of these characteristics are the brain's 
broad universality as a learning system and its mysterious ability to dy- 
namically change (reconfigure) its behavior depending on a combinatorial 
number of different contexts. 

This paper discusses a class of hypothetically brain-like dynamically 
reconfigurable associative learning systems that shed light on the possible 
nature of these properties of the brain. The systems are arranged on the general
principle referred to as the concept of E-machine. 

The paper addresses the following questions: 

1. How can "dynamical" neural networks function as universal programmable 
"symbolic" machines? 

2. What kind of a universal programmable symbolic machine can form 
arbitrarily complex software in the process of programming similar to 
the process of biological associative learning? 

3. How can a universal learning machine dynamically reconfigure its soft- 
ware depending on a combinatorial number of possible contexts? 



* Accepted for publication in "Progress in Computer Science Research", Nova Science Pub- 
lishers, Inc. 
The paper explains the concept of E-machine and outlines a broad range
of its potential applications. These applications include: context-sensitive 
associative memory, context-dependent pattern classification, context-dependent 
motor control, imitation, simulation of complex "informal" environments 
and natural language. 

Introduction 

When observed "from the outside" the human brain seems to behave as a se- 
quential symbolic machine. How else can one explain such "clearly symbolic" 
phenomena as mental computations and natural language? When observed "from 
the inside," however, the neural networks of the brain evoke an idea of a noisy 
dynamical system with distributed parameters rather than the image of a logic 
circuitry of a digital computer - gradually changing potentials, decaying residual 
excitation, high level of fluctuations. Neurons do produce spikes reminiscent of 
the pulses in a digital computer. It is widely believed, however, that it is the 
frequency of these pulses rather than their presence or absence that carries the
important information. 

1. How can "dynamical" neural networks function as universal programmable 
"symbolic" machines? 

2. What kind of a universal programmable symbolic machine can form arbitrar- 
ily complex software in the process of programming similar to the process of 
biological associative learning? 

3. How can a universal learning machine dynamically reconfigure its software 
depending on a combinatorial number of possible contexts? 

The metaphor "the brain as an E-machine" (Eliashberg, 1967, 1979, 1981, 
1989, 1990b) sheds light on these questions. The metaphor suggests that the 
brain is neither a traditional symbolic system, nor is it a traditional dynamical 
system. It is a "non-classical symbolic system" in which the probabilities of se- 
quential discrete ("symbolic") processes are controlled by the massively parallel 
continuous ("dynamical") processes. 

Note. The general idea that the brain employs a combination of symbolic and 
dynamical computational mechanisms was entertained in different forms by dif- 
ferent researchers (Collins and Quillian, 1972; Anderson, 1976; and many others.) 
The concept of E-machine is an attempt to provide a neurobiologically consistent 
formalization of this general idea. The requirement of neurobiological consistency 
makes a big difference! 
The paper is divided into two parts. Part I consists of three main sections:

1. The Whole Human Brain as a Universal Learning Computer. This
section takes a broader look at the problem of information processing in 
the whole human brain. It argues that there exists a relatively short for- 
mal representation of a universal learning computer similar to an untrained 
(unprogrammed) human brain. 

2. From Associative Neural Networks to E-machines. This section es- 
tablishes a link between associative neural networks and E-machines. It 
connects the effects of dynamic reconfiguration (neuromodulation) in neu- 
ral networks with the hypothetical states of dynamical memory available in 
individual neurons. These states of "residual-excitation-like" memory are 
referred to as the E-states. 

3. Molecular Interpretation of E-states: Ensembles of Protein Nanomachines
as Statistical Mixed-signal Computers. This section addresses
the problem of a neurobiological implementation of the E-states and the next
E-state procedures. It describes a formalism that connects the dynamics of
macroscopic E-states with the statistical conformational dynamics of ensem- 
bles of protein molecules (such as ion channels) embedded in neural mem- 
branes. A single protein molecule is treated as a probabilistic nanomachine, 
and the E-states are interpreted as the average numbers of such nanoma- 
chines in different states - the average occupation numbers. The formal- 
ism suggests that it is the statistical conformational dynamics of protein 
molecules in individual neurons rather than the collective statistical dynam- 
ics of neural networks that performs the main volume of the brain hardware 
computations. There are not enough neurons in the whole human brain to
implement the required amount of computations in the networks built from 
"simple neurons." 

Part II includes the following main sections: 

4. Computing with E-states. This section tackles the question as to how the 
massively parallel transformations of E-states allow a slow brain to efficiently 
process large arrays of symbolic data stored in its long-term memory (LTM) 
without moving this data into a read/write memory buffer. 

5. Hierarchical structure: sparse-recoding, data compression and sta- 
tistical filtering. This section explains how E-machines with hierarchi- 
cal structure of associative memory can perform efficient data compres- 
sion, context-dependent statistical filtering, and context-dependent gener- 
alization. 
6. Discussion

1 The Whole Human Brain as a Universal Learning Computer

This section takes a broader look at the problem of information processing in the
whole human brain. It argues that there exists a relatively short formal represen- 
tation of a universal learning computer similar to an untrained (unprogrammed) 
human brain. 

1.1 System (Man, World) as a composition of two "machines"

Consider a cognitive system (W,D,B) schematically shown in Figure 1, where W
is an external world, D is a set of human-like sensory and motor devices, and B
is a hypothetical computing system simulating the work of the human nervous
system. One can think of system (D,B) as a human-like robot. From the system-
theoretical viewpoint, it is useful to divide system (W,D,B) into two subsystems:
(W,D) and B, where (W,D) is the external world as it appears to the brain B
via devices D. In this representation, both subsystems can be treated as abstract
"machines", the inputs of B being the outputs of (W,D) and vice versa.

For the sake of simplicity, I refer to B as the brain. At this general level, the
rest of the nervous system can be treated as a part of block D. Let B(t) denote
the state of B at time t, where t=0 corresponds to the beginning of learning. I
argue that the following general propositions are true:



[Figure 1 diagram labels: World (W), Sensorimotor devices (D), Brain (B); External system = (W,D); Robot = (D,B).]

Figure 1: System (Robot, World) as a composition of two machines 
1. There exists a relatively short formal representation of B(0). This
representation is encoded in the human genome and can be small enough to fit on
a single floppy disk.

2. No special mathematical formalism is needed to describe the work of B(0).
Given powerful enough hardware, a relatively small C or C++ program
would be able to simulate the work of B(0) with a time step of, say, 1 msec.
This program would be sufficient to adequately represent all important
psychological characteristics of B(0). A more complex, but still rather small C
or C++ program would be able to simulate the work of B(0) with a time
step of, say, 1 μsec. This program would be sufficient to adequately
represent all important psychological characteristics of B(0) and many of its
neurobiological characteristics.

3. There exists a relatively short formal representation of the sensorimotor 
devices, D, since this representation is encoded in the human genome. The 
metaphorical floppy disk mentioned in item 1 has enough room for both 
B(0) and D. We know that B(0) can do well with different kinds of artificial 
devices, so the main secret is in B(0) rather than in D. 

4. In the general case, there exists no finite formal representation of system 
(W,D) - this system can be infinitely complex. This doesn't prevent one from 
simulating the behavior of system (W,D,B), because the "robot" (D,B) has 
a finite formal representation, and the external world, W, is "always there" 
to experiment with. 

5. Any formal representation of B(t) for a big t (say, t>10 years) must be 
very long (terabytes?) - this representation must include in some form 
a representation of the brain's individual experience which resulted from 
interaction with (W,D). Whatever language is used for the representation 
of B(t), the main part of this representation is the representation of the 
knowledge accumulated in the course of learning. Figuratively speaking, the
human brain works as a "complexity sucker" that gets most of its complexity
from system (W,D).

6. The knowledge is represented in B(t) in a rather "raw" form - the brain's
learning algorithm is close to "memorizing raw sensory-motor-emotional
experience." No special data structures are needed. Instead of pre-processing
data before putting it in memory, the brain uses a powerful massively par- 
allel decision-making procedure capable of processing the "raw" experience 
on the fly depending on context. 

7. It is practically impossible to understand B(t) without understanding B(0) 
and studying the process of learning that changes B(0) into B(t). 
8. It is practically impossible to formally represent and simulate nontrivial
parts of the behavior of system (W,D,B(t)) without having an adequate 
formal representation of B(t). That is, an adequate cognitive theory cannot 
be separated from the theory of the brain. 

9. The main goal of brain modelling must be reverse engineering B(0). This
is a clearly defined and practically achievable goal. (I refer to this reverse
engineering project as the Brain Zero or the Brain 0 project. Visit
www.brain0.com.) To advance toward this goal one should concentrate on
the analysis of basic psychological and neurobiological observations rather 
than on the mimicking of the parts of the brain's behavior. The latter strat- 
egy leads one into the "new-effect-new-model" pitfall and is cursed by the 
combinatorial explosion of the number of partial models needed to represent 
the whole behavior. 

10. The role of B(0) in cognitive science can be meaningfully compared with the
role of the Maxwell equations in classical electrodynamics. The same
Maxwell equations (a metaphorical counterpart of B(0)) coupled with an
infinite variety of specific external constraints (a metaphorical counterpart
of (W,D)) allow one to simulate an infinite variety of specific classical
electromagnetic phenomena. Similarly, the same B(0) interacting with different
external systems (W,D) would allow one to simulate, in principle, an infinite
variety of arbitrarily complex cognitive phenomena.

1.2 The Maxwell equations metaphor: the pitfall of a "pure phenomenology"

The example of physics warns us that one should not underestimate the power of 
simple basic mechanisms of Mother Nature. I argue that this warning is relevant 
to the problem of reverse engineering the "physical" system B(0). The brain is 
designed by Mother Nature - not by the human system engineers. This makes all 
the difference in the world. 

We (humans) design artificial information processing systems to make them 
easier to understand, test and debug. This costs us extra resources. In contrast,
Mother Nature tends to solve natural design problems with minimum resources. It 
makes Her designs look clever. It also makes them difficult to understand. In such 
minimum-resource designs different functions are necessarily strongly integrated 
and cannot be easily structured as independent blocks. 

An integration of a set of simple physical principles can produce a "critical 
mass" effect. The introduction of the so-called "displacement current" in the 
Maxwell equations gives a classical example of this interesting phenomenon. All 
of a sudden, this simple addition to the set of known basic laws of electricity and
magnetism allowed J.C. Maxwell to create his famous equations that cover the
whole range of arbitrarily complex classical electromagnetic phenomena. 

I argue that something similar happened in the case of the human brain.
Not too much was needed to transform the brains of simple animals into the human 
brain. A clever integration of a relatively small set of powerful "basic mechanisms" 
produced a "critical mass" effect. 

To understand the pitfall of a "pure phenomenology" consider the following 
metaphor. Imagine a physicist who wants to simulate the behavior of electro- 
magnetic field in a complex microwave device, e.g., the Stanford Linear Accel- 
erator (SLAC). Assume that this physicist doesn't know about the existence of 
the Maxwell equations and, even more importantly, doesn't believe that the com- 
plex behavior he observes may have something to do with such simple equations. 
(In the AI jargon this physicist would be called "scruffy." If he believed in the
existence of the basic equations he would be called "neat.") 

So this "scruffy physicist" sets out to do a purely phenomenological computer 
simulation of the observed complex behavior per se. Anyone who was involved 
in the computer simulation of the behavior of electromagnetic field in a linear 
accelerator can easily predict the results of this gedanken experiment.

In the best case scenario, the above mentioned scruffy physicist comes up with 
a computer program (with a large number of empirical parameters) capable of 
simulating the behavior of electromagnetic field in a very narrow range. This 
computer program has no extrapolating power and is not accepted by the SLAC
community as a theory of a linear accelerator.

Note that it would be impossible to reverse engineer the Maxwell equations (a 
metaphorical counterpart of B(0)) from the analysis of the behavior of electromag- 
netic field in such a complex "external world" as SLAC. I argue that, similarly, it 
is impossible to reverse engineer B(0) from the analysis of such complex cognitive 
phenomena in system (W,D,B(t)) as playing chess, solving complex mathematical 
problems, story telling, etc. 

1.3 Basic observations 

To formulate some "technical requirements" to an adequate model of B(0) con- 
sider the following basic observations: 

OBSERVATION 1. A person with a sufficiently large external memory aid 
(for example, a sheet of paper divided into squares) can perform, in principle, any 
effective computational procedure. A formalization of this observation led the
famous English mathematician Alan Turing (1936) to the invention of his celebrated
machine and to the corresponding formalization of the intuitive notion of
an algorithm. (See Minsky, 1967, for a relevant discussion of Turing's ideas.) 
Now that the concept of an algorithm is defined, we can say that a model of
system (W,D,B), where W is an external memory aid, must be a universal computing
system. (This is a necessary but, of course, not a sufficient requirement.)

OBSERVATION 2. We are not born with the knowledge of all possible al- 
gorithms. We can learn, however, to perform, in principle, any given algorithm, 
say, by simulating the work of a Turing machine representing this algorithm. 

This observation means that the above system (W,D,B) must be a universal 
learning system. 

OBSERVATION 3. A person with a good visual memory performing compu- 
tations with the use of an external memory aid learns to perform similar mental 
computations using the corresponding imaginary memory aid. A chess player
learns to move chess pieces on an imaginary chess board. An abacus user learns
to operate on an imaginary abacus (Baddeley, 1980). And so on. In principle, a 
person can learn to perform any mental computations by mentally simulating the 
process of writing symbols on a sheet of paper. 

Ignoring some severe, but theoretically unimportant limitations on the size of 
the working space available via this mechanism of mental imagery, this observa- 
tion suggests that the human brain, B, itself - not just a person with an external 
memory aid - must be treated by a system theorist as a universal learning system. 

Note. An adequate model of B(0) must have the highest general level of computing
power. Attempting to simulate the work of the human brain using a learning
system with the general level of computing power lower than that of the brain can 
be compared with an attempt to design a Perpetual Motion machine in violation 
of the energy conservation law. No matter how sophisticated a learning process 
might be, no system can learn to do what it cannot do in principle. (An elephant 
learns to fly only in a Disney film.) 

OBSERVATION 4. We (humans) can imagine new sensory events and syn- 
thesize new motor reactions. At the same time we can remember and recall the 
real sequence of events (reactions). For example, an experienced chess player can
mentally play any chess game. At the same time he/she can recall the real games
he/she played. Similarly, we can generate a combinatorial number of new
sentences. At the same time we can recite by heart a specific text we've learned.

What kind of learning algorithm can accommodate these different types of learn- 
ing? Do we need different learning algorithms? 

OBSERVATION 5. We memorize new information with the references to the 
pieces of the information which we already have in our long-term memory (LTM). 
The more we know in a certain area the easier it is to remember new things related 
to this area. For example, we can easily remember long sentences in the language
we know. It is next to impossible to remember long sentences in a language we
don't know. It is also very difficult, for a second language speaker, to get rid of 
the accent, because he/she tends to build the words of the second language from 
the syllables of the first language. 

How can this hierarchical referencing system be implemented in neural net- 
works? 

OBSERVATION 6. Our ability to retain information in our short-term memory 
(STM) increases if similar information is present in our LTM. We can repeat a 
sentence in the language we know. We cannot repeat a sentence in a language we 
don't know. We can imitate only those reactions of other people that we can do 
ourselves. The same is true for perception. We have difficulties recognizing words 
of a foreign language that we cannot pronounce ourselves. 

What is STM? How does it interact with LTM? What is working memory? 
What does motor control have to do with it? 

OBSERVATION 7. To imagine different sensory events we need to do mental
motor reactions that would cause similar events. We need to mentally sing a
melody to imagine another person singing this melody. We need to mentally say 
a sentence to imagine another person saying this sentence. Etc. 

What is mental imagery? How does mental imagery interact with motor con- 
trol? 

OBSERVATION 8. We can see different sub-pictures in the same picture de- 
pending on what we expect to see. The Necker cube is an example. We can hear 
different tunes in the same sequence of sounds (e.g., the sounds produced by a 
moving train) depending on what we expect (want) to hear. 

What kind of mechanism available in neural networks can account for these 
phenomena of mental set? 

OBSERVATION 9. We can selectively tune our attention to a voice we want
to hear in a noisy room - the so-called cocktail party phenomenon.

How can the brain temporarily increase sensitivity to signals with some not 
easily definable characteristics? 

OBSERVATION 10. Our short-term memory can retain only a limited number 
(seven plus or minus two) of items: the "magical number" of Miller, 1956. How- 
ever, due to the effect of "chunking" the size of a single item can be significantly 
increased. We also can "see more than we can report" (Sperling, 1960). This
raises the same set of questions as Observation 6.
OBSERVATION 11. The brain is a slow and noisy system. It cannot process
symbolic information in a traditional ("classical") way by moving symbols
in a read/write memory buffer. Nevertheless, we can learn to mentally simulate
different external systems (W,D) with the properties of a read/write memory. 
(For example, we can mentally move chess pieces on an imaginary chess board or 
mentally write and erase symbols on an imaginary sheet of paper.) 

How can computational universality in Turing's sense (Chomsky's type 0) be 
achieved without moving symbols in a read/write memory? How can neural net- 
works learn to simulate a symbolic read/write memory? 

Note. The problem of how the brain can learn to simulate an external system 
(W,D) with the properties of a read/write memory must not be confused with the 
problem of how a neural network can implement a read/write memory. The lat- 
ter problem is trivial. The former problem is nontrivial and critically important. 
Traditional neural network models cannot learn to simulate external systems with 
the properties of a read/write memory and, therefore, cannot serve as models of 
the brain's systems responsible for mental imagery. 

OBSERVATION 12. We can recognize that a certain object, A, is statistically
strongly correlated with another object, B. We can also produce a reaction, R,
statistically well correlated with a certain stimulus, S. Importantly, this statistical
relationship depends on context. Two objects strongly correlated in one context 
may be not correlated at all in a different context. Our language has words usual, 
unusual, common, uncommon, etc., that reflect our ability to recognize statistical 
relationships. 

How can a huge amount of computations required for context-dependent sta- 
tistical processing be done "on the fly" by slow neural networks? (Note that it 
must be done "on the fly," because context can change very rapidly. These statistics
cannot be precalculated, because there is a combinatorial number of possible
contexts!) 
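
The point about precalculation can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the mechanism proposed in this paper: experience is assumed to be stored as raw records, each a pair of hypothetical context tags and observed objects, and the statistic is computed at query time by scanning the records that match the requested context. The function name, the record layout, and the sample data are all invented for the example.

```python
def conditional_frequency(records, context, given, target):
    """Estimate how often `target` accompanies `given` among the stored
    records whose context tags include every tag in `context`.
    Nothing is precomputed; the raw records are scanned at query time."""
    matching = [
        objects
        for tags, objects in records
        if context <= tags and given in objects
    ]
    if not matching:
        return None  # no relevant experience is stored
    return sum(target in objects for objects in matching) / len(matching)


# Hypothetical raw experience: (context tags, objects observed together).
records = [
    (frozenset({"city", "autumn"}), {"umbrella", "rain"}),
    (frozenset({"city", "autumn"}), {"umbrella", "rain"}),
    (frozenset({"city", "summer"}), {"umbrella", "rain"}),
    (frozenset({"city", "summer"}), {"umbrella"}),
    (frozenset({"beach", "summer"}), {"umbrella", "sun"}),
    (frozenset({"beach", "summer"}), {"umbrella", "sun"}),
]

for context in ({"city"}, {"beach"}, {"summer"}, {"city", "autumn"}):
    value = conditional_frequency(records, frozenset(context), "umbrella", "rain")
    print(sorted(context), value)
```

With n independent context tags there are 2^n possible queries of this kind, so a table of precomputed answers grows exponentially while the raw record list grows only with experience. The sequential scan used here is, of course, exactly what a slow system cannot afford; the sketch shows what has to be computed, not how a brain could compute it quickly.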

OBSERVATION 13. We can wait for a certain object, A. Once A appears 
we recognize that A is the object we were waiting for. If we expect a certain ob- 
ject, B, to appear and, instead, an unexpected object, C, appears we recognize 
that C is an unexpected object. We can answer the questions: "What are you 
waiting for? What do you expect?" 

How does the brain temporarily mark an object as an object "being waited for" 
or as an object "being expected?" 

OBSERVATION 14. Pattern recognition is a context-dependent activity. Consider
the question: "What is it?" In the context of this question a person behaves
as a pattern classifier. He/she can answer, for example, that this is a book. The
person's brain was able to distinguish a book from other objects, say, a box, a 
disk drive, etc. Now consider the instruction: "Take this." In this context it is no 
longer important that the object has the name book. What is important is the 
object's size, weight, position, etc. The experience acquired while "taking a book" 
is applicable to "taking a box" and "taking a disk drive." That is, the same object 
is treated as a member of different classes depending on context. 

How can a context-dependent pattern classification be done "on the fly?" 

OBSERVATION 15. We can recognize our emotional states. We remember 
our emotional experience. We use this experience to evaluate new events. Our 
concepts of good, bad, important, unimportant, etc. are formed in the process of 
learning. 

How do we learn to recognize our emotions? How does our emotional memory 
interact with other types of memory? 

OBSERVATION 16. We can recognize internal states and internal reactions 
of other people. We can say, for example, "I know how you feel." We know that 
another person is thinking, waiting, etc. When we learn by imitating another per- 
son, we are not imitating this person as a black box. This means that the problem 
of learning cannot be formalized as the automata theory problem of one machine 
deciphering the structure of another machine observed as a black box. (If this 
formalization were true, we wouldn't be able to learn, in principle, a behavior of
Chomsky's type 2 and higher.)

How do we learn to control our internal reactions? How do we learn the names 
of our internal reactions (thinking, imagining, recalling, waiting, seeing, listening, 
etc.)? How do we recognize similar internal reactions in other people? 

How does mental imagery interact with perception? 

OBSERVATION 17. Much of what we see we see from our memory. For 
example, when we are driving a car in a familiar environment we need only to 
glance at the scene to update the visual picture we expect. We can close our eyes 
and see the room we live in by mentally moving the eyes and mentally turning the 
head. 

How do the signals coming from external system (W,D) interact with the signals 
coming from memory? How is our mental imagery synchronized with the external 
system (W,D)? What does motor control have to do with it? 
1.4 Motor control and mental imagery



Let us expand the structure of system (W,D,B) of Figure 1 as shown in Figure 2.
The brain B is divided into two blocks: AM and NM, where AM is an associative
learning system that forms Sensory, Motor → Motor (SM→M) associations, and
NM is a set of motor centers. The diagram also depicts the block TEACHER.
In this case, the teacher acts as an idealized neurophysiologist, who can produce
any desired output of centers NM by "clamping" these centers. System AM
receives sensory signals from system (W,D) and motor signals from the output of
centers NM. This approach to teaching and learning is similar to the so-called
supervised learning, except that, in our case, the learning system receives its sensory
input from the external system (W,D) rather than from the teacher. This can be
compared with the so-called instrumental conditioning.

Let us make a further expansion of the structure of system (W,D,B) as shown in
Figure 3. The brain B is now divided into four blocks: AS, AM, NS and NM, where
blocks AM and NM are the same as in Figure 2, NS are sensory centers, and AS is
an associative learning system that forms Motor, Sensory → Sensory (MS→S)
associations.

Figure 3: Mental imagery as a simulation of the external system (W,D)

The goal of system AM is to simulate the block TEACHER. The goal of system 
AS is to simulate the external system (W,D). It is easy to see that system (W,D) 
plays the same role for system AS as the block TEACHER does for AM. We 
will view systems AM and AS as the systems responsible for motor control and 
mental imagery, respectively. We will view the sets of (SM→M) and (MS→S)
associations as the brain's software associated with the above functions. 

1.5 Mental computations (thinking) as an interaction be- 
tween motor control and mental imagery 

A specific example of system (W,D,B) shown in Figure 4 gives a simplified general
explanation of the phenomenon of mental computations. The model was implemented
as an educational program, called EROBOT, for Microsoft Windows.
(The program can be purchased from www.brain0.com.) An explicit description
of this model was given in Eliashberg (2003). In Figure 4:

• W is an external memory aid (the tape divided into squares). 

• D is a set of devices including the eye, the hand and the speech organ. 

• B is the brain divided into four blocks AM, AS, NM and NS that have the 
same general meaning as in Figure 3.

The robot's devices, D, allow it to simulate the work of any Turing machine by 
performing the following elementary operations: 

1. read a symbol from the single square scanned by the eye 

2. write a symbol into the scanned square 

3. move the eye and the hand simultaneously to the next square, the next 
square being the one to the left, the one to the right, or the same square 

4. utter a symbol to be kept in mind for one cycle - this one-cycle memory is 
provided by the delayed feedback between the motor signal, utter symbol, 
to the speech organ and the proprioceptive signal, symbol uttered, from this 
organ. 
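
The four elementary operations listed above can be sketched in a few lines of code. This is an illustrative sketch only, not the EROBOT implementation: the class `TapeWorld`, the blank symbol `_`, the helper functions, and the example controller are assumptions introduced here. The external system (W,D) is modeled as a tape with a single eye/hand position and a one-cycle "uttered symbol" memory; a controller standing in for the brain B maps the two sensory signals to the three motor commands.

```python
from dataclasses import dataclass, field

BLANK = "_"  # assumed blank symbol for squares never written


@dataclass
class TapeWorld:
    """External system (W,D): a tape, an eye/hand position, a speech organ."""
    squares: dict = field(default_factory=dict)  # position -> symbol
    position: int = 0
    uttered: str = BLANK  # proprioceptive signal, available on the next cycle

    def read(self) -> str:
        """Operation 1: read the symbol in the scanned square."""
        return self.squares.get(self.position, BLANK)

    def write(self, symbol: str) -> None:
        """Operation 2: write a symbol into the scanned square."""
        self.squares[self.position] = symbol

    def move(self, step: int) -> None:
        """Operation 3: move eye and hand together (left, stay, or right)."""
        if step not in (-1, 0, 1):
            raise ValueError("step must be -1, 0 or +1")
        self.position += step

    def utter(self, symbol: str) -> None:
        """Operation 4: utter a symbol to be kept in mind for one cycle."""
        self.uttered = symbol


def run_cycle(world, controller):
    """One cycle: sense (symbol seen, symbol uttered), then act."""
    seen, kept = world.read(), world.uttered
    write_symbol, step, utter_symbol = controller(seen, kept)
    world.write(write_symbol)
    world.move(step)
    world.utter(utter_symbol)


def run(world, controller, max_cycles=100):
    """Repeat cycles until the controller utters 'halt' (assumed convention)."""
    for _ in range(max_cycles):
        if world.uttered == "halt":
            break
        run_cycle(world, controller)
    return world


def flip_controller(seen, kept):
    """Hypothetical hand-written controller: overwrite 'a' with 'b', moving right."""
    if seen == BLANK:
        return BLANK, 0, "halt"
    return ("b" if seen == "a" else seen), 1, "scan"


world = run(TapeWorld(squares={0: "a", 1: "c", 2: "a"}), flip_controller)
print([world.squares[i] for i in range(3)])
```

Because the uttered symbol is fed back as an input on the next cycle, it plays the role of the internal state of a Turing machine, while the tape supplies the unbounded read/write memory.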

An experiment with the model consists of two stages: training and examination.
At the stage of training the teacher forces the robot (by acting on its motor
centers) to perform several examples of a specified algorithm with different input
data presented on tape. (The parenthesis checker algorithm borrowed from Minsky
(1967) is used as a built-in example in the program EROBOT.)

Figure 4: Mental computations as interaction between motor control and mental
imagery

The following results of learning are achieved:

1. System AM learns to simulate the teacher, so the robot can perform the 
demonstrated algorithm with any input data without the help of the teacher. 

2. In the case of a finite tape, and a sufficient number of training examples, sys- 
tem AS learns to simulate the external system (W,D). Accordingly, the robot 
learns to perform the demonstrated algorithm with the use of an imaginary 
memory aid. (The robot keeps writing symbols on the real tape to show 
what it calculates on the imaginary tape. The robot doesn't see the real 
tape!) 
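
The first of these results can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the transition table `TEACHER` is a parenthesis checker written for this example (it is not claimed to reproduce Minsky's table or the EROBOT code), and learning is modeled as nothing more than recording every (symbol kept in mind, symbol seen) → action pair that occurs while the teacher drives the motor centers. The sketch covers only system AM. Learning to simulate the tape itself (system AS) is deliberately not attempted, since a plain lookup table of this kind cannot do that.

```python
# Hypothetical teacher: (kept symbol, seen symbol) -> (write, step, utter).
# "A" marks both ends of the tape; "X" marks an already matched parenthesis.
TEACHER = {
    ("R", "("): ("(", +1, "R"),   # scanning right for a ")"
    ("R", "X"): ("X", +1, "R"),
    ("R", ")"): ("X", -1, "L"),
    ("R", "A"): ("A", -1, "C"),   # right end reached: final check
    ("L", "X"): ("X", -1, "L"),   # scanning left for the matching "("
    ("L", "("): ("X", +1, "R"),
    ("L", "A"): ("0", 0, "halt"),  # unmatched ")"
    ("C", "X"): ("X", -1, "C"),
    ("C", "("): ("0", 0, "halt"),  # unmatched "("
    ("C", "A"): ("1", 0, "halt"),  # everything matched
}


def run(expression, policy, memory=None):
    """Perform the algorithm on a tape; optionally record raw experience."""
    tape = ["A", *expression, "A"]
    position, kept = 1, "R"
    while kept != "halt":
        seen = tape[position]
        action = policy[(kept, seen)]  # KeyError: situation never experienced
        if memory is not None:
            memory.append(((kept, seen), action))  # stored as is
        write, step, kept = action
        tape[position] = write
        position += step
    return tape[position]  # "1" if balanced, "0" otherwise


def train(examples):
    """Training stage: the teacher drives the actions, the learner records."""
    memory = []
    for expression in examples:
        run(expression, TEACHER, memory)
    return memory


def recall(memory):
    """Associative lookup built from the recorded pairs."""
    return dict(memory)


learned = recall(train(["(()", "())", "()"]))
print(run("()(())", learned), run(")(", learned))
```

The three training expressions happen to exercise every entry of the teacher's table, which is why the learned lookup generalizes to inputs that were never demonstrated. With fewer examples, `run` would raise a `KeyError` on the first situation it had not experienced.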
1.6 The pitfall of a "smart" learning algorithm

The main part of today's research in learning is devoted to the development and 
study of what can be referred to as "smart" learning algorithms. Such algo- 
rithms attempt to create "optimal" representations of the learner's experience in 
the learner's memory. I argue that this general approach (however interesting
and important from the engineering and mathematical viewpoints) cannot be
employed by a universal learning system similar to the human brain. The catch is
that a smart learning algorithm aimed at a "single-context" optimization is not 
universal. While optimizing performance in a selected context, it throws away a 
lot of information needed in a variety of other contexts. 

Consider, for example, Observation 14 from Section 1.3. This observation suggests
that, in the case of the human brain, there is no such thing as an optimal
context-independent classification. The main issue is not "how" to pre-process
information in the course of learning (Hebbian learning, backpropagation, simu- 
lated annealing, etc.), and how to store this pre-processed information in memory 
(distributed, local, synaptic, optical, etc.), but "what" information to learn. The 
human concepts of "good", "bad", "important", and "unimportant" change with 
experience. Therefore, a " smart" learning algorithm with a fixed criterion of opti- 
mality - the criterion that is not affected by the contents of data - cannot serve as 
an adequate metaphor for human learning. What seems unimportant today may 
become important tomorrow when new information is acquired. 

I argue that a really smart universal learning system - such as B(0) - must use
a "dumb" but universal learning algorithm. Instead of doing much pre-processing
of data before placing it in memory, such a system must use an efficient decision-making
(data interpretation) procedure to process "raw experience" dynamically
(on the fly) depending on context. Theoretically, a powerful enough interpretation
procedure can always make up for a "dumb" learning algorithm as long as this
algorithm doesn't lose data. In contrast, no decision-making procedure can make
up for a "smart" learning algorithm that throws away a lot of information. The
loss of data is irremediable.

2 From Associative Neural Networks to E-machines 

This section introduces the concept of a primitive E-machine (Eliashberg, 1979) 
as a natural information processing extension of the notion of a homogeneous 
associative neural network. A complex E-machine is a system built from several 
primitive E-machines. Complex E-machines will be discussed in Part II of this 
paper. 

2.1 Simple example of associative neural network: Model ANN-0

Consider the neural network shown schematically in Figure 5. The functional model
of this network described in this section will be referred to as Model ANN-0
(Associative Neural Network # 0).

In Figure 5, large circles with incoming and outgoing lines represent neurons
with their dendrites and axons, respectively. Small white and black circles represent
excitatory and inhibitory synapses, respectively. The network has three layers
of neurons: input neurons N1, intermediate neurons N2, and output neurons N3.
Neurons N2 have a global inhibitory feedback via neuron N4 and local excitatory
feedbacks. It will be shown that in this network neurons N2 can compete via
reciprocal inhibition in the winner-take-all fashion. A similar effect can be obtained
in a network with lateral inhibitory feedbacks. Figure 5 uses the following
notation:

• Nk[j] is the j-th neuron from set Nk.

• Smk[i,j] is the synapse between neuron Nk[j] and neuron Nm[i].

• $x_j$ is the output of neuron N1[j].

• $g^x_{ij}$ is the gain of synapse S21[i,j].

• $s_i$ is the net synaptic current of synapses S21[i,1], ..., S21[i,m]. $s_i$ represents
a similarity between input vector $x$ and vector $g^x_i$ (expression (1)).

• $r_i$ is the output of neuron N2[i].

• $q$ is the output of neuron N4. This output is the sum of the feedback signal
$\beta \sum_i r_i$ and an external signal $x_{inh}$.

• $\beta$ is the gain of the synapse between any neuron from N2 and neuron N4.

• $\tau$ is the time constant of any neuron from N2.

• $\alpha$ is the gain of the synapse providing local excitatory feedback for a neuron
from N2.

• $g^y_{ki}$ is the gain of the synapse between neuron N2[i] and neuron N3[k].

Figure 5: Simple example of associative neural network

The following functional model of the network of Figure 5 was studied in Eliashberg
(1967, 1979). In this model a neuron is treated as a linear threshold element with
zero threshold and the time constant $\tau$. In spite of its simplicity, this model
has a significant educational value because it allows one to explicitly bridge the
gap between its neurobiological and psychological theories and to show what kind
of mathematics is involved in this bridging. No learning algorithm is described,
and it is assumed that the model is preprogrammed before the beginning of an
experiment.



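$$ s_i = \sum_{j=1}^{m} g^x_{ij}\, x_j \tag{1} $$

$$ \tau \frac{du_i}{dt} + u_i = s_i + \alpha \cdot r_i - q \tag{2} $$

$$ r_i = \begin{cases} u_i & \text{if } u_i > 0 \\ 0 & \text{otherwise} \end{cases} \tag{3} $$

$$ q = \beta \sum_{i=1}^{n} r_i + x_{inh} \tag{4} $$

$$ y_k = \sum_{i=1}^{n} g^y_{ki} \cdot r_i \tag{5} $$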

Let all $x_j$ and $x_{inh}$ (and, therefore, all $s_i$) be step functions of time. Then, for
all active neurons from layer N2 - the neurons for which $u_i > 0$ - the solution of
equations (2)-(4) can be represented in the following explicit form:

$$ u_i = \frac{s_i - s_{av}}{\alpha - 1}\left(e^{at} - 1\right) + \left(u_i^0 - u_{av}^0\right)e^{at} + \frac{s_{av} - x_{inh}}{1 + \beta \cdot n_1 - \alpha}\left(1 - e^{-bt}\right) + u_{av}^0\, e^{-bt} \tag{6} $$
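where

• $n_1$ is the number of active neurons from N2.

• $u_i^0$ ($i = 1, \ldots, n$) are the values of $u_i$ at $t = 0$.

• $s_{av}$ and $u_{av}^0$ are the average values of $s_i$ and $u_i^0$ for all active neurons from
N2:

$$ s_{av} = \frac{1}{n_1} \sum_{i=1}^{n_1} s_i \tag{7} $$

$$ u_{av}^0 = \frac{1}{n_1} \sum_{i=1}^{n_1} u_i^0 \tag{8} $$

Parameters $a$ and $b$ in $e^{at}$ and $e^{-bt}$ are as follows:

$$ a = (\alpha - 1)/\tau \tag{9} $$

$$ b = (1 + \beta \cdot n_1 - \alpha)/\tau \tag{10} $$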

Let $1 < \alpha < 1 + \beta$. Then $a > 0$. According to expression (6), neurons N2[i]
with $s_i > s_{av}$ increase their potentials $u_i$. Neurons N2[i] with $s_i < s_{av}$ decrease
their potentials and switch off once $u_i < 0$. This reduces $n_1$ and increases $s_{av}$,
making $s_i < s_{av}$ for some additional neurons from N2. Eventually, only neurons
with $s_i = \max(s_1, \ldots, s_n)$ will have $u_i > 0$. It can be shown that this equilibrium is
unstable if $n_1 > 1$. Therefore, in the presence of noise, at the end of the transient
response there will be only one winner randomly selected from the set of neurons
with the maximum level of $s_i$.
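
The winner-take-all transient described above can be checked numerically. The following is a minimal sketch, assuming forward-Euler integration of equations (2)-(4) with a linear threshold output; the function name `simulate_wta`, the step size, the initial potentials, and the parameter values are illustrative assumptions and are not taken from the original model description.

```python
def simulate_wta(s, alpha=1.5, beta=1.0, tau=1.0, x_inh=0.0,
                 u_start=0.01, dt=0.01, steps=5000):
    """Integrate tau*du_i/dt + u_i = s_i + alpha*r_i - q with forward Euler.

    s      : similarities s_i of the N2 neurons (held constant, i.e. a step input)
    alpha  : gain of the local excitatory feedback (1 < alpha < 1 + beta assumed)
    beta   : gain of the synapses from N2 to the inhibitory neuron N4
    """
    u = [u_start] * len(s)                       # equal small initial potentials (assumed)
    for _ in range(steps):
        r = [max(ui, 0.0) for ui in u]           # linear threshold output, eq. (3)
        q = beta * sum(r) + x_inh                # output of neuron N4, eq. (4)
        u = [ui + dt * (si + alpha * ri - q - ui) / tau   # eq. (2)
             for ui, si, ri in zip(u, s, r)]
    return u


potentials = simulate_wta([0.5, 0.9, 0.7])
winners = [i for i, ui in enumerate(potentials) if ui > 0]
print(winners)       # indices of the neurons that are still active
print(potentials)
```

With a single active neuron left, equations (2)-(4) have the fixed point $u = (s_{max} - x_{inh})/(1 + \beta - \alpha)$, which gives a simple check of the final potential of the winner.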

2.2 Model ANN-0 as a symbolic machine 

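Let us introduce a finite ("psychological") time step $\Delta t \gg \tau$, and let us assume
that inputs change step-wise at moments $t_\nu$ and $t_\nu + \Delta t/2$, where

$$ t_\nu = \nu \cdot \Delta t, \quad \nu = 0, 1, \ldots \tag{11} $$

$$ x(t) = \begin{cases} x(\nu) & \text{for } t \in (t_\nu, t_\nu + \Delta t/2] \\ \ldots & \text{otherwise} \end{cases} \tag{12} $$

Let us introduce a periodic inhibition

$$ x_{inh}(t) = \begin{cases} \ldots & \text{if } t \in (t_\nu, t_\nu + \Delta t/2] \\ \ldots & \text{otherwise} \end{cases} \tag{13} $$

Let us sample outputs at the end of the first half of each cycle

$$ y(\nu) = y(t_\nu + \Delta t/2) \tag{14} $$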



Let us assume that the states of the Input LTM (ILTM) and the Output LTM (OLTM)
are specified at the beginning of an experiment with the model and don't change
during the experiment (the model is preprogrammed in advance and no learning
takes place during the experiment). Let us also assume that the parameters of the
model are the same for all experiments. To describe the "psychological" properties
of Model ANN-0 we need the following system-theoretical concepts.

DEFINITIONS: 

• A (deterministic) combinatorial machine is a system M = (X, Y, f), where
X and Y are finite sets of symbols, called the input and the output set
(or alphabet) of M, respectively; $f : X \to Y$ is the output function of M.
Machine M works as follows: $y_\nu = f(x_\nu)$, where $x_\nu \in X$ and $y_\nu \in Y$ are the
input and the output symbols at the $\nu$-th cycle.

• A probabilistic combinatorial machine is a system M = (X, Y, $\delta$), where X
and Y are the same as above; $\delta : X \times Y \to [0, 1]$ is the function of output
conditional probabilities of M. Machine M works as follows:
$P\{y_\nu = b \mid x_\nu = a\} = \delta(a, b)$, where $x_\nu, a \in X$ and $y_\nu, b \in Y$, and $P\{B \mid A\}$
is the conditional probability of B given A.

• Machine M1 simulates (is equivalent to) machine M2 if these two machines
cannot be distinguished from each other by observing their inputs and outputs
(observing them as black boxes).

The following properties of Model ANN-0 - with the inputs and outputs described
by expressions (11), (12), (13), (14) - can be proved (Eliashberg, 1979):

1. Let X and Y be finite subsets of the sets of input and output vectors of the
model, respectively. Let $x(\nu) \in X$ and $y(\nu) \in Y$. Let $f^s : X \times X \to R$
be the similarity function from expression (1); in this case $f^s$ is the scalar
product. Let the pair $(X, f^s)$ satisfy the following correct decoding condition:

$$ \forall x, x' \in X: \ (\text{if } x \neq x' \text{ then } f^s(x, x') < f^s(x, x)) \tag{15} $$

For any combinatorial machine M = (X, Y, f) there exists a state, g, of the
LTM of the model (and some fixed values of parameters of the model) such
that the model in the state g simulates (is equivalent to) machine M.

2. The previous result extends to any probabilistic combinatorial machine (X, Y, $\delta$)
with rational probabilities $\delta$.
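
The correct decoding condition (15) is easy to test for a concrete input set. The sketch below is illustrative only: the helper names `scalar_product` and `satisfies_correct_decoding` are hypothetical, and the two sample input sets are assumptions chosen to show one set that satisfies the condition and one that does not.

```python
def scalar_product(x, g):
    """Similarity function assumed for Model ANN-0: the scalar product."""
    return sum(a * b for a, b in zip(x, g))


def satisfies_correct_decoding(X, f=scalar_product):
    """Check condition (15): f(x, x') < f(x, x) for all distinct x, x' in X."""
    return all(f(x, xp) < f(x, x) for x in X for xp in X if x != xp)


one_hot = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]   # each vector is most similar to itself
nested = [(1, 0), (1, 1)]                     # (1,0) is as similar to (1,1) as to itself

print(satisfies_correct_decoding(one_hot))
print(satisfies_correct_decoding(nested))
```

With the scalar product, a vector whose non-zero components are contained in another vector of the set violates the condition, which is why the second set fails.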

2.3 Model AF-0: A trivial primitive E-machine corresponding to Model ANN-0

The "psychological" properties of Model ANN-0 can be described in algorithmic
terms. The description presented below gives an example of a trivial primitive
E-machine - a primitive E-machine without E-states. This model will be referred
to as Model AF-0 (Associative Field # 0).

Notation

In this paper I use a C-like notation mixed with scientific-like notation to represent
models of E-machines aimed at humans. (I use C++ for computer simulation.) I
use special notation for the following operations:

• A := {a | P(a)} select the set of elements a with the property P(a). I
use Pascal-like notation to emphasize the dynamic character of this
operation.

• a :∈ A select an element a from the set A at random with equal probability.

DECODING: compare input vector with all vectors in Input LTM

    for(i = 1; i <= n; i++) s[i] = Similarity(x[*], gx[*][i]);    (1)

CHOICE: select the set of locations with the maximum value of s[i]

    MAXSET := {i | s[i] = max(s[1], ... s[n])};    (2)

randomly select a winner (win) from MAXSET

    win :∈ MAXSET;    (3)

ENCODING: read output vector from the selected location, win, of Output LTM

    if(s[win] > xinh) y[*] = gy[*][win]; else y[*] = NULL;    (4)
Comments: 

1. As long as the Similarity() function and the set of allowable inputs, X, satisfy
the correct decoding condition (expression (15) with $f^s$ = Similarity),
Model AF-0 is a system universal with respect to the class of combinatorial
machines.

2. The psychological Model AF-0 is much simpler than the neurobiological 
model ANN-0. Model AF-0 doesn't have all the neural-implementation- 
parameters of model ANN-0. It also doesn't have the fast changing (neuro- 
biological) state u. 

3. In the next sections Model AF-0 will be enhanced in several directions. 

(a) Adding a one-cycle delayed feedback from y to x. This will change
Model AF-0 into a system universal with respect to the class of finite-state
machines.

(b) Adding a universal learning algorithm. The new model will become a
learning system universal with respect to the class of finite-state machines.

(c) Introducing E-state arrays, a next E-state procedure, and a Structural
LTM (SLTM). This will transform Model AF-0 into a nontrivial primitive
E-machine capable of producing some interesting effects of working
memory and temporal context (mental set).

(d) Introducing associative inputs and outputs. This enhancement will al- 
low us to get effects of sparse re-coding, data compression and context- 
dependent statistical filtering. 
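
The DECODING, CHOICE, and ENCODING steps of Model AF-0 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's C++ simulation: the function name `af0_step`, the row-per-location layout of `gx` and `gy` (the transpose of the gx[*][i] notation), the scalar-product similarity, and the one-hot coding of the XOR example are all choices made for this sketch.

```python
import random


def similarity(x, g):
    """Scalar product, used here as the Similarity() function (an assumption)."""
    return sum(a * b for a, b in zip(x, g))


def af0_step(x, gx, gy, x_inh=0.0, rng=random):
    """One cycle of Model AF-0 for a non-empty LTM.

    gx[i] is the input vector stored in location i of Input LTM,
    gy[i] is the output stored in location i of Output LTM.
    """
    s = [similarity(x, g) for g in gx]                      # DECODING
    top = max(s)
    maxset = [i for i, si in enumerate(s) if si == top]     # CHOICE: MAXSET
    win = rng.choice(maxset)                                # CHOICE: random winner
    return gy[win] if s[win] > x_inh else None              # ENCODING


# Hypothetical example: XOR as a combinatorial machine, inputs 00, 01, 10, 11
# coded one-hot so that the correct decoding condition holds.
XOR_GX = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
XOR_GY = [0, 1, 1, 0]

print([af0_step(x, XOR_GX, XOR_GY) for x in XOR_GX])
```

Reprogramming the machine amounts to replacing the contents of `gy` (or `gx`); the step function itself stays the same, which is the sense in which the model is universal over combinatorial machines.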

2.4 Delayed feedback and simulation of finite-state ma- 
chines 

DEFINITION 

A (deterministic) finite-state machine is a system M = (X, Y, S, $\alpha$, $\omega$), where X
and Y are finite sets of external symbols of M called the input and the output sets
(alphabets), respectively, S is a finite set of internal symbols of M called the state
set, $\omega : X \times S \to Y$ is a function called the output function of M, and $\alpha : X \times S \to S$ is
a function called the next-state function of M. The work of machine M is described
by the following expressions: $s_{\nu+1} = \alpha(x_\nu, s_\nu)$, and $y_\nu = \omega(x_\nu, s_\nu)$, where $x_\nu \in X$,
$y_\nu \in Y$, and $s_\nu \in S$ are the values of input, output, and state variables at the
moment $\nu$, respectively.

Figure 6: Finite-state machine as a combinatorial machine with a one-cycle delayed feedback

Note. There are different equivalent formalizations of the concept of a finite-state
machine. The formalization described above is known as a Mealy machine.
Another popular formalization is a Moore machine. In a Moore machine the
output is described as a function of the next-state. Practical electronic designers
usually use the term state machine instead of the term finite-state machine. Any
finite-state machine can be implemented as a combinatorial machine with a one-cycle
delayed feedback (see Figure 6). Using this trick, it is easy to show that
Model AF-0 with a delayed feedback can simulate any finite-state machine.
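
The delayed-feedback construction can be illustrated with a small example. The sketch below assumes the combinatorial machine is given as a lookup table from (input, state) pairs to (output, next state) pairs; the function name `run_with_delayed_feedback` and the running-parity machine are hypothetical examples, not taken from the paper.

```python
def run_with_delayed_feedback(table, inputs, s0):
    """Drive a combinatorial machine table[(x, s)] -> (y, s_next).

    The state output of cycle nu is fed back as part of the input of
    cycle nu + 1, which turns the table into a Mealy machine.
    """
    s, outputs = s0, []
    for x in inputs:
        y, s = table[(x, s)]     # one combinatorial lookup per cycle
        outputs.append(y)
    return outputs


# Hypothetical example: running parity of a bit stream.
# The state is the parity so far; the output is the updated parity.
PARITY = {(x, s): (x ^ s, x ^ s) for x in (0, 1) for s in (0, 1)}

print(run_with_delayed_feedback(PARITY, [1, 0, 1, 1], s0=0))
```

The table itself has no memory; all of the sequential behavior comes from the single variable `s` carried from one cycle to the next.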

2.5 Introducing a universal learning algorithm 

Let us return to the system (W,D,B) introduced in Section 1. Simple as it is, Model
AF-0 has enough computing power to serve as the motor control system AM, because
the one-cycle delayed feedback "utter-symbol - symbol-uttered" transforms
block AM into a system universal with respect to the class of finite-state machines (as
explained in the previous section). This gives the system (W,D,B) the power of
a universal Turing machine. (A Turing machine is a finite-state machine coupled
with an external tape through the I/O device called the head. The block (W,D)
provides the functionality of the tape and the head of this machine.)

What kind of learning algorithm does Model AF-0 need to be able to learn
to simulate any combinatorial machine?
It is easy to show that the simplest algorithm satisfying this requirement is
"tape-recording" the X-sequence and the Y-sequence in the Input and Output LTM,
respectively. In the case of a deterministic combinatorial machine, this algorithm
can be improved by recording only new associations. In the case of a probabilistic
combinatorial machine the same associations need to be recorded several times to
accumulate statistics.

In phenomenological terms, the above tape recording algorithm can be de- 
scribed as follows: 

    BOOL wen;  // write enable: auxiliary input variable
    int wptr;  // write pointer: auxiliary state variable

    if (wen) {gx[*][wptr] = x[*]; gy[*][wptr] = y[*]; wptr++;}    (5)
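
A minimal Python sketch of this tape-recording algorithm follows. The class name `TapeRecorderLTM` and the `only_new` flag (which implements the "record only new associations" refinement for deterministic machines) are assumptions of this sketch; the write pointer wptr is represented implicitly by the current length of the lists.

```python
class TapeRecorderLTM:
    """Input and Output LTM filled by plain "tape-recording" of (x, y) pairs."""

    def __init__(self):
        self.gx = []   # Input LTM: one stored input vector per location
        self.gy = []   # Output LTM: one stored output per location

    def record(self, x, y, wen=True, only_new=False):
        """Store the association (x, y) at the write pointer if wen is set."""
        if not wen:
            return
        if only_new and (x, y) in zip(self.gx, self.gy):
            return     # deterministic case: skip associations already stored
        self.gx.append(x)
        self.gy.append(y)   # appending plays the role of wptr++


ltm_all = TapeRecorderLTM()
ltm_new = TapeRecorderLTM()
for _ in range(3):
    ltm_all.record((1, 0), "a")                  # repeated recording keeps statistics
    ltm_new.record((1, 0), "a", only_new=True)   # the same association is stored once

print(len(ltm_all.gx), len(ltm_new.gx))
```

Recording every repetition is what allows the relative frequencies of outputs to be recovered later in the probabilistic case; the `only_new` variant trades that information for a shorter program.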

It is interesting to mention that some famous psychiatrists were advocating 
this concept of tape-recording-learning. Here is a quotation from Meynert (1884): 
"Each new impression meets a new, still vacant cell. With the existence of such 
vast number of these vacant cells, impressions arriving in succession find carriers 
in which they will remain forever in the same close order". 

As mentioned in Section 1.6, the concept of a "smart" learning algorithm creates a
methodological pitfall. The catch is that the human concept of important information
changes with experience, so no learning algorithm with a fixed criterion of
optimality can be smart enough to know in advance which information is important
to store and which is not. What seems unimportant today may become very
important tomorrow when more information is acquired.

I argue that there is no special magic in how the knowledge is stored in the 
brain (distributed, local, analog, digital, etc.). The magic is in what knowledge is 
stored and how this knowledge is processed dynamically depending on context. 

2.6 "Symbolic" or "nonsymbolic," that is the question 

Starting with the neural network shown in Figure 5, one can proceed in two
different directions:

1. When the neurons in layer N2 compete in a winner-take-all fashion ($1 < \alpha <
1 + \beta$), Model ANN-0 can be thought of as a neural counterpart of the
Programmable Logic Array (PLA) shown in Figure 7. The input synaptic
matrix (Input LTM) is similar to the programmable AND-array, and the
output synaptic matrix (Output LTM) is similar to the programmable OR-array.
If one goes in this direction one gets some "neural extras," such
as a generalization by similarity and the ability to simulate probabilistic
combinatorial machines. Taking this path eventually brings one to the
concept of a primitive E-machine.

Figure 7: Programmable Logic Array (PLA)

2. If one reduces the competition of neurons in layer N2, one enters the realm
of connectionist neural networks. Let us set $\alpha = \beta = 0$. Let us also replace
the linear threshold output function by a sigmoid function. Model ANN-0
becomes a typical Parallel Distributed Processing (PDP) system. In the traditional
connectionist graphical representation, this system looks as shown
in Figure 8.

If one selects this "nonsymbolic" path, one is inspired to view neural networks
as analog computational devices implementing multidimensional mappings
$f : R^m \to R^n$, where R is the set of real numbers. It is seldom possible
to find the weights, corresponding to nontrivial multidimensional mappings,
analytically. Therefore the development and study of the learning algorithms,
automatically adjusting the weights, becomes the main thrust of
this research. (There is plenty of room in multidimensional real spaces, so
one can spend one's life searching for the "magical neural mappings.")

Figure 8: Associative neural network as a "connectionist" system

Which way to go? I argue that the first direction is the right way to go if one
is interested in the biological brain. The second approach has the following liabilities
(each of which is sufficient to disqualify this approach as an adequate biological
framework):

1. The learning algorithms used in PDP models (such as backpropagation, 
simulated annealing, etc.) are not universal. (See Rumelhart, McClelland, 
et al (1986) for the explanation of the PDP framework.) 

2. Traditional PDP models don't have a sufficiently general level of computing
power to adequately address such critically important "symbolic" problems
as the problem of natural language. (See Pinker and Mehler (1988) for a
discussion of this issue.)

3. PDP models provide no satisfactory explanation of the phenomena of work- 
ing memory and mental set. They are largely inconsistent with Observations 
1-17 from Section 1.5. 

4. Biological neural networks don't have the accuracy needed to implement 
traditional PDP algorithms. 

5. Traditional PDP models have no room to accommodate the known complexity
of biological neurons. The whole vision of the brain as a
collective-distributed-dynamical system built from simple "atomic" neurons is inconsistent
with the modern neurobiological data (Kandel and Spencer, 1968;
Kandel, Jessell, and Schwartz, 2000; Nicholls, Martin, Wallace, 1992; Byrne,
1987). A single neuron is a complex integrated computing element. The
brain has many different types of neurons tailored for different tasks.

2.7 Introducing E-states: Model AF-1 

The basic architecture of Model AF-1 is shown in Figure 9. As compared with
Model AF-0, this model has two additional procedures: BIAS and NEXT E-STATE
PROCEDURE. Both these procedures are included in the block EXCITATION.
DECODING: compare input vector with all vectors in Input LTM

    for(i = 1; i <= n; i++) s[i] = Similarity(x[*], gx[*][i]);    (1)

BIAS: calculate biased similarity. Coefficients a and b determine, respectively,
the additive and the multiplicative biasing effect of the "residual excitation" e[i].

    for(i = 1; i <= n; i++) se[i] = s[i] + a * e[i] + b * s[i] * e[i];    (2)

CHOICE: select the set of locations with the maximum value of se[i]

    MAXSET := {i | se[i] = max(se[1], ... se[n])};    (3)

randomly select a winner (win) from MAXSET

    win :∈ MAXSET;    (4)

ENCODING: read output vector from the selected location, win, of Output LTM

    if(s[win] > xinh) y[*] = gy[*][win]; else y[*] = NULL;    (5)

NEXT E-STATE PROCEDURE: calculate next E-state

    for(i = 1; i <= n; i++)    (6)
        if(s[i] > e[i]) e[i] = s[i];          // instant charge
        else e[i] = e[i] * (tau - 1) / tau;   // discharge with the time constant tau

LEARNING: calculate next state of LTM (G-state)

    if(wen) {gx[*][wptr] = x[*]; gy[*][wptr] = y[*]; wptr++;}    (7)

Figure 9: The general architecture of Model AF-1



For the sake of concreteness let us define the following similarity function: 

    float Similarity(int *x, int *g)    (8)
    {
        float s;
        int j, k;

        s = 0; k = 0;
        for(j = 1; j <= m; j++)
        {
            if(x[j] == g[j] && x[j] != 0) s++;   // count non-zero matches
            if(x[j] != 0) k++;                   // count non-zero input components
        }
        if(k > 0) s /= k; else s = 0;
        return s;
    }

Note. The Similarity() is equal to the number of non-zero matches (x[j] = g[j] ≠ 0)
divided by the number of non-zero components of input vector (x[j] ≠ 0).
Many other similarity functions, satisfying the correct decoding condition - Section
2.2, Expression (15) - would work as well.
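
The complete cycle of Model AF-1 can be sketched in Python as follows. This is a minimal illustration, not the author's simulation code: the class name `AF1`, the 0-based indexing, the parameter values, and the two-command example program are assumptions made for the sketch. The example shows the effect of temporal context: the same input produces different outputs depending on which input preceded it.

```python
import random


def similarity(x, g):
    """Non-zero matches divided by the number of non-zero input components."""
    matches = sum(1 for xj, gj in zip(x, g) if xj == gj and xj != 0)
    nonzero = sum(1 for xj in x if xj != 0)
    return matches / nonzero if nonzero else 0.0


class AF1:
    def __init__(self, gx, gy, a=0.5, b=0.0, tau=10.0, x_inh=0.0):
        self.gx, self.gy = list(gx), list(gy)
        self.a, self.b, self.tau, self.x_inh = a, b, tau, x_inh
        self.e = [0.0] * len(self.gx)                 # E-state: residual excitation

    def step(self, x, rng=random):
        s = [similarity(x, g) for g in self.gx]                      # DECODING
        se = [si + self.a * ei + self.b * si * ei                    # BIAS
              for si, ei in zip(s, self.e)]
        top = max(se)
        win = rng.choice([i for i, v in enumerate(se) if v == top])  # CHOICE
        y = self.gy[win] if s[win] > self.x_inh else None            # ENCODING
        self.e = [si if si > ei else ei * (self.tau - 1) / self.tau  # NEXT E-STATE
                  for si, ei in zip(s, self.e)]
        return y


def respond_after(context):
    """Present a context symbol, then the ambiguous symbol 7, and return the answer."""
    machine = AF1(gx=[(1, 7), (2, 7)], gy=["left", "right"])
    machine.step((context, 0))      # charges e[] of the location matching the context
    return machine.step((0, 7))     # both locations match; the bias breaks the tie


print(respond_after(1), respond_after(2))
```

In the second step both stored vectors have similarity 1 to the input, so without the BIAS term the winner would be chosen at random; the residual excitation left by the first step makes the choice deterministic.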

2.8 Dynamic reconfiguration: "many symbolic machines 
in one" 

Model AF-1 uses a very simple mechanism of EXCITATION (expressions (2) and
(6)). This simple mechanism is sufficient to illustrate some important effects associated
with the introduction of E-states.

Terminology. The pair (gx[*][*], gy[*][*]) will be called the program or the table
of associations of Model AF-1. The pair (gx[*][i], gy[*][i]) will be called the i-th
command (the i-th association) of the program (the table of associations). The
number of commands in a program will be called the length of the program.

It is easy to prove the following result:

Let C(X,Y) be a class of combinatorial machines with the input alphabet
X and the output alphabet Y. Model AF-1 with a fixed program of the length
|X| · |Y| (or greater) can be changed (reconfigured) into any machine from class
C(X,Y) by changing its E-state, e[*].

Proof. Let the program (gx[*][*], gy[*][*]) contain at least once each pair
from X × Y, and let N(M) be the subset of locations containing all commands
of a combinatorial machine M from the above class. Let $\tau \gg 1$. Let e[i] = 1 if
i ∈ N(M), and e[i] = 0 otherwise. Model AF-1 with this program and this E-state
will simulate machine M.

This result illustrates the importance of E-states. Model AF-0 with a program 
of the length |X| can simulate a single combinatorial machine from C(X,Y). To 
simulate a different machine, this model must be reprogrammed. Model AF-1 
with a program of the length |X| • |Y| can simulate any machine from the above 
class without reprogramming. 

Example. Let C(X,Y) be the class of all logic functions with m inputs and
one output, that is, $X = \{0, 1\}^m$ and $Y = \{0, 1\}$. Model AF-1 with a fixed
program of the length $2^{m+1}$ can be reconfigured into any of the $2^N$ possible logic
functions, where $N = 2^m$.
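
The example can be made concrete for m = 2. The sketch below is an illustration under stated assumptions: the program contains every pair from X × Y once (8 commands), the E-state is held fixed while the machine is tested (an assumption of this sketch; the next E-state procedure is not applied), bits are stored as the non-zero symbols 1 and 2 because the Similarity() function of expression (8) ignores zero components, and the names `encode`, `e_state_for`, and `respond` are hypothetical.

```python
from itertools import product


def similarity(x, g):
    """Non-zero matches divided by the number of non-zero input components."""
    matches = sum(1 for xj, gj in zip(x, g) if xj == gj and xj != 0)
    nonzero = sum(1 for xj in x if xj != 0)
    return matches / nonzero if nonzero else 0.0


def encode(bits):
    """Store bit 0 as symbol 1 and bit 1 as symbol 2, so no component is zero."""
    return tuple(b + 1 for b in bits)


M_BITS = 2
INPUTS = list(product((0, 1), repeat=M_BITS))
# Fixed program: one command for each (input, output) pair, length 2**(M_BITS + 1).
PROGRAM = [(bits, y) for bits in INPUTS for y in (0, 1)]


def e_state_for(func):
    """E-state that marks the commands N(M) belonging to the logic function func."""
    return [1.0 if func(*bits) == y else 0.0 for bits, y in PROGRAM]


def respond(bits, e, a=0.5, b=0.0):
    """BIAS and CHOICE over the fixed program with a frozen E-state e."""
    x = encode(bits)
    s = [similarity(x, encode(g_bits)) for g_bits, _ in PROGRAM]
    se = [si + a * ei + b * si * ei for si, ei in zip(s, e)]
    win = max(range(len(PROGRAM)), key=se.__getitem__)   # unique maximum here
    return PROGRAM[win][1]


FUNCTIONS = {
    "AND": lambda p, q: p & q,
    "OR": lambda p, q: p | q,
    "XOR": lambda p, q: p ^ q,
}

for name, func in FUNCTIONS.items():
    e = e_state_for(func)
    print(name, [respond(bits, e) for bits in INPUTS])
```

The program list never changes between the three runs; only the vector `e` does, which is the sense of "many symbolic machines in one".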

Why is it better to reconfigure than to reprogram? This critically important 
question will be discussed in Part II of this paper. 

3 Molecular Interpretation of E-states: Ensembles of Protein Nanomachines as Statistical Mixed-signal Computers

What can be a meaningful neurobiological interpretation of the phenomenological
E-states? How can nontrivial next E-state procedures be implemented in neural
networks?

This section presents a formalism that offers an answer to these questions. 
The formalism can be viewed as a system theoretical extrapolation of the main 
idea of the Hodgkin and Huxley (1952) theory that the sodium and potassium 
ion channels, embedded in the axon membrane, work as stochastic switches with 
several internal states (conformations). The formalism was discussed in Eliashberg 
(1989, 1990a, and 2003). 

3.1 Concept of protein molecule machine (PMM)

DEFINITION. A Protein Molecule Machine (PMM) is an abstract probabilistic
computing system (X, Y, S, $\alpha$, $\omega$), where

• X and Y are the sets of real input and output vectors, respectively

• $S = \{s_0, \ldots, s_{n-1}\}$ is a finite set of states

• $\alpha : X \times S \times S \to R'$ is a function describing the input-dependent conditional
probability densities of state transitions, where $\alpha(x, s_i, s_j)dt$ is the conditional
probability of transfer from state $s_j$ to state $s_i$ during time interval
dt, where $x \in X$ is the value of input, and R' is the set of non-negative
real numbers. The components of x are called generalized potentials. They
can be interpreted as membrane potential and concentrations of different
neurotransmitters.

• $\omega : X \times S \to Y$ is a function describing output. The components of $y \in Y$
are called generalized currents. They can be interpreted as ion currents and
the flows of second messengers.

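Let $x \in X$, $y \in Y$, $s \in S$ be, respectively, the values of input, output, and state at
time t, and let $P_i$ be the probability that $s = s_i$. The work of a PMM is described
as follows:

$$ \frac{dP_i}{dt} = \sum_{j \neq i} \alpha(x, s_i, s_j)\, P_j - P_i \sum_{j \neq i} \alpha(x, s_j, s_i) \tag{1} $$

$$ \text{at } t = 0: \quad \sum_{i=0}^{n-1} P_i = 1 \tag{2} $$

$$ y = \omega(x, s) \tag{3} $$

Figure 10: Protein molecule as a probabilistic nanomachine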
Summing the right and the left parts of (1) over $i = 0, \ldots, n-1$ yields

$$ \frac{d}{dt} \sum_{i=0}^{n-1} P_i = 0 \tag{4} $$

so the condition (2) holds for any t.

The internal structure of a PMM is shown in Figure 10, where $dp_{ij}$ is the probability
of transition from state $s_j$ to state $s_i$ during time interval dt. The output
$y = \omega(x, s)$ is a function of input and the current state.
For the probability of transition from state $s_j$ to state $s_i$ we have $dp_{ij} = \alpha(x, s_i, s_j)\, P_j\, dt$.
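
The evolution of the state probabilities of a PMM can be explored numerically. The following is a minimal sketch, assuming forward-Euler integration of equation (1) with a constant input, so that the rates $\alpha(x, s_i, s_j)$ reduce to a fixed matrix; the function name `master_equation_step`, the two-state example, and its rate values are illustrative assumptions.

```python
def master_equation_step(P, alpha, dt):
    """One forward-Euler step of equation (1).

    alpha[i][j] is the transition rate from state j to state i;
    diagonal elements are ignored.
    """
    n = len(P)
    new_P = []
    for i in range(n):
        gain = sum(alpha[i][j] * P[j] for j in range(n) if j != i)
        loss = P[i] * sum(alpha[j][i] for j in range(n) if j != i)
        new_P.append(P[i] + dt * (gain - loss))
    return new_P


# Hypothetical two-state machine: state 0 -> 1 with rate 2, state 1 -> 0 with rate 1.
RATES = [[0.0, 1.0],
         [2.0, 0.0]]

P = [1.0, 0.0]                 # the molecule starts in state 0
for _ in range(5000):
    P = master_equation_step(P, RATES, dt=0.01)

print(P, sum(P))
```

For this example the stationary distribution follows from balancing the two flows, 2·P[0] = 1·P[1], which gives P[1] = 2/3; the total probability stays equal to 1, as equation (4) requires.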



3.2 Example: Voltage-Gated Ion Channel as a PMM 



Ion channels are studied by many different disciplines: biophysics, protein chem- 
istry, molecular genetics, cell biology and others (see Hille, 2001). I am concerned 
with the information processing (computational) possibilities of ion channels. 

I postulate that, at the information processing level, ion channels (as well as 
some other membrane proteins) can be treated as PMMs. That is, at this level, 
the exact biophysical and biochemical mechanisms are not important. What is 
important are the properties of ion channels as abstract machines. 

This situation can be meaningfully compared with the general relationship be- 
tween statistical physics and thermodynamics. Only some properties of molecules 
of a gas (e.g., the number of degrees of freedom) are important at the level of 
thermodynamics. Similarly, only some properties of protein molecules are impor- 
tant at the level of statistical computations implemented by the ensembles of such 
molecules. 

The general structure of a voltage-gated ion channel is shown schematically
in Figure 11a. Figures 11b and 11c show how this channel can be represented as
a PMM. In this example the PMM has five states $s \in \{0, 1, \ldots, 4\}$, a single input
x = V (the membrane potential) and a single output y = I (the ion current).

Using the Goldman-Hodgkin-Katz (GHK) current equation we have the following
expression for the output function $\omega(x, s)$.



(5) 



It follows from that 




(6) 





where

• $I_j$ is the ion current in state s = j with input x = V

• $p_j$ [cm/sec] is the permeability of the channel in state s = j

• z is the valence of the ion (z = 1 for $K^+$ and $Na^+$, z = 2 for $Ca^{2+}$)

• $F = 9.6484 \cdot 10^4$ [C/mol] is the Faraday constant

• $V' = VF/(RT)$ is the ratio of membrane potential to the thermodynamic potential,
where T [K] is the absolute temperature, and R = 8.3144 [J/(K · mol)] is the
gas constant

• $C^{in}$ and $C^{out}$ [mol] are the cytoplasmic and extracellular concentrations of
the ion, respectively

Figure 11: Ion channel PMM
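
For orientation, the quantities listed above are the ones that enter the standard textbook form of the GHK current equation, $I = p\, z^2 F V' \, (C^{in} - C^{out} e^{-zV'}) / (1 - e^{-zV'})$. The sketch below evaluates that textbook form; whether the paper's expressions (5) and (6) have exactly this shape is an assumption, and the function names, the default temperature, the concentration units (mol/cm³), and the sample values are hypothetical.

```python
import math

F = 9.6484e4    # Faraday constant [C/mol]
R = 8.3144      # gas constant [J/(K*mol)]


def ghk_current(V, p, z, c_in, c_out, T=293.15):
    """Textbook GHK current density for one open-state permeability p.

    V [V] membrane potential, p [cm/s] permeability, z valence,
    c_in and c_out [mol/cm^3] (assumed units), T [K] absolute temperature.
    """
    v = V * F / (R * T)                       # V' = VF/(RT), dimensionless
    if abs(z * v) < 1e-9:                     # limit V -> 0 of the expression below
        return p * z * F * (c_in - c_out)
    decay = math.exp(-z * v)
    return p * z ** 2 * F * v * (c_in - c_out * decay) / (1.0 - decay)


def reversal_potential(z, c_in, c_out, T=293.15):
    """Nernst potential, at which the GHK current of a single ion vanishes."""
    return R * T / (z * F) * math.log(c_out / c_in)


# Hypothetical potassium-like values: 140 mM inside, 5 mM outside.
C_IN, C_OUT, PERM = 140e-6, 5e-6, 1e-6
V_REV = reversal_potential(1, C_IN, C_OUT)
print(V_REV, ghk_current(V_REV, PERM, 1, C_IN, C_OUT))
```

In the PMM picture only the permeability depends on the state of the channel, so a closed or inactivated state corresponds to a permeability close to zero and the same function gives a negligible current.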

One can make different assumptions about the function $\alpha(x, s_i, s_j)$ describing
the conditional probability densities of state transitions. It is convenient to
represent this function as a matrix of voltage-dependent coefficients $a_{ij}(V)$:

$$ a = \begin{pmatrix} a_{00}(V) & \ldots & a_{0j}(V) & \ldots & a_{0m}(V) \\ \vdots & & \vdots & & \vdots \\ a_{i0}(V) & \ldots & a_{ij}(V) & \ldots & a_{im}(V) \\ \vdots & & \vdots & & \vdots \\ a_{m0}(V) & \ldots & a_{mj}(V) & \ldots & a_{mm}(V) \end{pmatrix} $$

where m = n - 1. Note that the diagonal elements of this matrix are not used in
equation (1).






In the model of spike generation discussed in Eliashberg (1990a) both sodium,
$Na^+$, and potassium, $K^+$, channels were treated as PMMs with five states shown
in Figure 11. Coefficients $a_{10}$, $a_{21}$, $a_{32}$ were assumed to be sigmoid functions of
membrane potential, and coefficients $a_{43}$ and $a_{04}$ constant. In the case of the
sodium channel, s = 3 was used as a high permeability state, and s = 4 was used
as an inactive state. In the case of the potassium channel, s = 3 and s = 4 were assumed
to be high permeability states.



3.3 Concept of an Ensemble of Protein Molecule Machines 
(EPMM) 

DEFINITION. An Ensemble of Protein Molecule Machines (EPMM) is a set of
identical independent PMMs with the same input vector, and the output vector
equal to the sum of output vectors of individual PMMs. The structure of an
EPMM is shown in Figure 12, where N is the total number of PMMs, $y^k$ is the
output vector of the k-th PMM, and y is the output vector of the EPMM. We
have

$$ y = \sum_{k=1}^{N} y^k \tag{9} $$

Let $N_i$ denote the number of PMMs in state s = i (the occupation number of
state i). Instead of (9) we can write

$$ y = \sum_{i=0}^{n-1} N_i\, \omega(x, s_i) \tag{10} $$

Figure 12: The structure of EPMM

$N_i$ ($i = 0, \ldots, n-1$) are random variables with the binomial probability distributions

$$ P\{N_i = m\} = \binom{N}{m} P_i^m (1 - P_i)^{N-m} \tag{11} $$

$N_i$ has the mean $\mu_i = N P_i$ and the variance $\sigma_i^2 = N P_i (1 - P_i)$.

Let us define the relative number of PMMs in state s = i (the relative occupation
number of state i) as

$$ e_i = \frac{N_i}{N} \tag{12} $$



The behavior of the average $\bar{e}_i$ is described by the equations similar to (1)
and (2):

$$ \frac{d\bar{e}_i}{dt} = \sum_{j \neq i} \alpha(x, s_i, s_j)\, \bar{e}_j - \bar{e}_i \sum_{j \neq i} \alpha(x, s_j, s_i) \tag{13} $$

$$ \text{at } t = 0: \quad \sum_{i=0}^{n-1} \bar{e}_i = 1 \tag{14} $$

The average output $\bar{y}$ is equal to the sum of average outputs for all states:

$$ \bar{y} = N \sum_{i=0}^{n-1} \omega(x, s_i)\, \bar{e}_i \tag{15} $$

The standard deviation for $e_k$ is equal to

$$ \sigma_k = \sqrt{P_k (1 - P_k)/N} \tag{16} $$

Figure 13: E-states as the numbers of PMM's in different states

Figure 13 illustrates the implementation of E-states as relative occupation
numbers of the states of a PMM. The maximum number of independent E-state
variables is equal to n - 1. The number is reduced by one because of the additional
equation (14).
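
The statistical claim behind expression (16) can be checked with a small Monte Carlo experiment. The sketch below is illustrative only: it assumes that each of the N independent PMMs is found in a given state with a fixed probability P (for example, at a stationary regime), samples the relative occupation number many times, and compares the empirical spread with $\sqrt{P(1-P)/N}$. The function names, the seed, and the values of N, P, and the number of trials are assumptions of the sketch.

```python
import math
import random
import statistics


def sample_relative_occupation(N, P, rng):
    """Fraction of N independent PMMs that are found in the given state."""
    return sum(1 for _ in range(N) if rng.random() < P) / N


def empirical_mean_and_sigma(N, P, trials, seed=0):
    """Mean and standard deviation of the relative occupation number e."""
    rng = random.Random(seed)
    samples = [sample_relative_occupation(N, P, rng) for _ in range(trials)]
    return statistics.mean(samples), statistics.pstdev(samples)


N_PMM, P_STATE = 400, 0.3
mean_e, sigma_e = empirical_mean_and_sigma(N_PMM, P_STATE, trials=2000)
sigma_theory = math.sqrt(P_STATE * (1 - P_STATE) / N_PMM)   # expression (16)

print(mean_e, sigma_e, sigma_theory)
```

Because the spread shrinks as $1/\sqrt{N}$, an ensemble of many identical molecules behaves as an almost deterministic analog variable even though each molecule is a stochastic switch.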

3.4 EPMM as a Robust Mixed-Signal Computer 

An EPMM can serve as a robust analog computer with the input-controlled coefficient
matrix shown in Figure 14. Because some coefficients in this matrix can
change sharply (almost step-wise) as functions of inputs (e.g., the membrane potential),
an EPMM can be better characterized as a mixed-signal computer. The
statistical molecular implementation of this computer is extremely robust, since
all the characteristics of the whole computer are determined by the properties of
a single PMM.

Figure 14: EPMM as an analog computer with an input-controlled coefficient matrix

It is interesting to emphasize that the matrix of input dependent coefficients 
is implemented as the matrix of input dependent probabilities, so no external 
connections are needed. It would be very difficult (if at all possible) to reach this 
level of microminiaturization and this level of reliability using traditional VLSI 
techniques. 

3.5 On conformational dynamics and chemical kinetics 

When a neural modeler needs to simulate different effects of cellular STM, he/she 
usually assumes that these effects are associated with chemical kinetics and/or 
with the accumulation of different neurotransmitters and/or ions in different cel- 
lular compartments. This approach to cellular STM encounters serious problems: 

1. It is difficult to justify sufficiently big time constants - chemical kinetics is 
quite fast, and cellular compartments are very small. 

2. It is difficult to justify nontrivial nonlinearities. For example, it is difficult 
to get different time constants for increase (charge) and decrease (discharge) 
of an STM variable. 

3. It is difficult (if not impossible) to get nontrivial timing effects, e.g., different
results for different orders of input events.

All of these effects are readily available in the EPMM formalism, which deals
with sophisticated conformational dynamics rather than with relatively simple
chemical kinetics.
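
Two of the effects listed above, unequal charge and discharge time constants and sensitivity to the order of input events, can be reproduced with a toy state machine. The sketch below assumes a hypothetical three-state PMM and hypothetical per-step transition probabilities for two inputs, A and B, and for a rest condition. None of these values come from this paper.

```python
def step(e, p):
    """One update of the occupation numbers: e_next[j] = sum_i e[i] * p[i][j]."""
    n = len(e)
    return [sum(e[i] * p[i][j] for i in range(n)) for j in range(n)]


def apply_input(e, p, n_steps):
    """Hold one input condition (matrix p) for n_steps time steps."""
    for _ in range(n_steps):
        e = step(e, p)
    return e


# Hypothetical transition probabilities of a 3-state PMM (rows sum to 1).
P_A = [[0.8, 0.2, 0.0],      # input A drives s0 -> s1 quickly
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
P_B = [[1.0, 0.0, 0.0],      # input B drives s1 -> s2 quickly
       [0.0, 0.8, 0.2],
       [0.0, 0.0, 1.0]]
P_REST = [[1.0, 0.0, 0.0],   # without input, s1 and s2 decay slowly to s0
          [0.01, 0.99, 0.0],
          [0.01, 0.0, 0.99]]

e0 = [1.0, 0.0, 0.0]

# Unequal time constants: fast charge under A, slow discharge at rest.
charged = apply_input(e0, P_A, 20)
relaxed = apply_input(charged, P_REST, 20)

# Order sensitivity: A followed by B differs from B followed by A.
a_then_b = apply_input(apply_input(e0, P_A, 20), P_B, 20)
b_then_a = apply_input(apply_input(e0, P_B, 20), P_A, 20)
print(charged[1], relaxed[1], a_then_b[2], b_then_a[2])
```

In this toy model, state s2 becomes populated only when A precedes B, and the occupation of s1 builds up within a few steps but decays over roughly a hundred steps.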

IMPORTANT! To avoid a common misunderstanding, I want to emphasize that
conformational dynamics has nothing to do with traditional chemical kinetics.
Conformational dynamics is determined by the biophysical properties of protein
molecules. No chemistry is involved, for example, in the case of voltage-controlled
channels. Even in the case of ligand-controlled channels or enzymes, it is misleading
to think of the interaction between a neurotransmitter molecule and a protein
molecule as a chemical reaction. Protein molecules are very big (> 50,000 daltons),
whereas neurotransmitter molecules are tiny (< 100 daltons). A tiny molecule
changes the conformation of a big molecule, so the latter can temporarily open its
pore (as in the case of an ion channel) or become a catalyst producing a second
messenger. (See, for example, Changeux, 1993, and Hille, 2001.)

Figure 15: Two EPMMs interacting via (a) electrical and (b) chemical messages

3.6 What can be computed with EPMMs?

Too little is known about the properties of most membrane proteins to represent
them as abstract probabilistic nanomachines. The best studied are the sodium and
potassium channels used in the classical Hodgkin and Huxley (1952) model for the
generation of the nerve spike. It is believed that each of these protein molecules
has about five different states. In this specific case, the EPMM formalism gives a
good approximation of the available experimental data (Eliashberg, 1990a, 2003).
Therefore, it seems reasonable to believe that this formalism should work well in
many other, less studied cases.
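
As a hedged illustration of how a channel with about five states relates to the Hodgkin-Huxley description, the sketch below uses one standard Markov reading of the potassium channel: five states indexed by the number of open gating particles (0 to 4), with state 4 conducting. This is not necessarily the state diagram used in the cited papers. The rate functions are a commonly used modern form of the Hodgkin-Huxley potassium rates (voltage in mV, resting potential near -65 mV). The clamp voltage and integration settings are arbitrary.

```python
import math


def alpha_n(v):
    """Opening rate of one gating particle (1/ms); v in mV, undefined at v = -55."""
    return 0.01 * (v + 55.0) / (1.0 - math.exp(-(v + 55.0) / 10.0))


def beta_n(v):
    """Closing rate of one gating particle (1/ms); v in mV."""
    return 0.125 * math.exp(-(v + 65.0) / 80.0)


def relax(v, t_end=50.0, dt=0.01):
    """Euler-integrate the occupation numbers e[0..4] at a clamped voltage v."""
    a, b = alpha_n(v), beta_n(v)
    e = [1.0, 0.0, 0.0, 0.0, 0.0]      # all channels start with no open gates
    for _ in range(int(round(t_end / dt))):
        flow = [0.0] * 5
        for k in range(5):
            if k < 4:                  # k -> k+1 at rate (4 - k) * a
                f = (4 - k) * a * e[k] * dt
                flow[k] -= f
                flow[k + 1] += f
            if k > 0:                  # k -> k-1 at rate k * b
                f = k * b * e[k] * dt
                flow[k] -= f
                flow[k - 1] += f
        e = [e[k] + flow[k] for k in range(5)]
    return e


V_CLAMP = -20.0                        # mV, arbitrary illustrative value
e_final = relax(V_CLAMP)
n_inf = alpha_n(V_CLAMP) / (alpha_n(V_CLAMP) + beta_n(V_CLAMP))
print(e_final[4], n_inf ** 4)
```

At steady state the occupation of the conducting state should agree with the familiar n^4 term of the Hodgkin-Huxley potassium conductance, so in this case the ensemble description and the classical gating-variable description give the same result.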

A single neuron can have several different EPMMs interacting via electrical
messages (membrane potential) and chemical messages (different kinds of
neurotransmitters). As mentioned in Section 3.2, the Hodgkin-Huxley (1952) model
can be naturally expressed in terms of two EPMMs (corresponding to the sodium
and potassium channels) interacting via a common membrane potential (see Figure
15a). Figure 15b shows two EPMMs interacting via a second messenger. In this
example, EPMM1 is the primary transmitter receptor and EPMM2 is the second
messenger receptor.

Some examples illustrating nontrivial computational possibilities of the EPMM
formalism will be discussed in Part II of this paper.

3.7 The main statements 

1. The whole human brain is a nonclassical symbolic system - an E-machine
(Eliashberg, 1967, 1979).

2. The popular notion that the brain implements multidimensional real map- 
ping is a fallacy. The whole concept of a learning algorithm that optimizes 
synaptic weights to create the above mappings is largely irrelevant to the 
problem of human learning. 

3. The main data storage procedure of the human brain must be universal 
- close to "memorizing raw experience." Instead of processing data before 
placing it in memory, the brain must process "raw" data dynamically (on 
the fly) depending on context. No context-dependent statistics can be pre- 
calculated in advance, in principle, because the number of possible contexts 
explodes combinatorially. 

4. Biological neural networks have the right computational resources to im- 
plement the above dynamic approach. The main computational engine of 
the brain is the statistical mechanics of protein nanomachines rather than 
the "statistical mechanics of neural networks." The notion of a neuron as 
a simple atomic computing element, employed by the latter approach, is 
inconsistent with the available neurobiological data (Kandel and Spencer,
1968; Kandel, Jessell, and Schwartz, 2000; Nicholls, Martin, and Wallace, 1992;
Byrne, 1987).
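
Statement 3 can be made concrete with a toy sketch. It is not the E-machine mechanism itself, only an illustration of the general idea of storing raw records and computing context-dependent statistics at recall time. The record format (context features, input symbol, output symbol), the `recall` function, and the example data are all hypothetical.

```python
from collections import Counter


def recall(memory, cue, context):
    """Return the most frequent output among raw records that match the cue
    and contain every feature of the current context (None if no match)."""
    votes = Counter(
        out for (ctx, inp, out) in memory
        if inp == cue and context <= ctx      # subset test on frozensets
    )
    return votes.most_common(1)[0][0] if votes else None


# Raw experience is stored as is; no per-context statistics are precomputed.
memory = [
    (frozenset({"zoo", "night"}), "bat", "animal"),
    (frozenset({"zoo", "day"}), "bat", "animal"),
    (frozenset({"baseball", "day"}), "bat", "club"),
    (frozenset({"baseball", "night"}), "bat", "club"),
    (frozenset({"baseball", "night"}), "bat", "animal"),
]

print(recall(memory, "bat", frozenset({"zoo"})))
print(recall(memory, "bat", frozenset({"baseball"})))
print(recall(memory, "bat", frozenset({"night"})))
```

With k context features there are on the order of 2^k possible contexts, so tabulating the answer for each in advance is infeasible. Filtering the raw records at query time avoids the tabulation, at the cost of doing the work during recall.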